Abstract 

Background: Lymph node status is not part of the staging system for cervical cancer, but provides important 
information for prognosis and treatment. We investigated whether lymph node status can be predicted with 
proteomic profiling. 

Material & methods: Serum samples of 60 cervical cancer patients (FIGO I/II) were obtained before primary 
treatment. Samples were run through an HPLC depletion column, eliminating the 14 most abundant proteins 
ubiquitously present in serum. Unbound fractions were concentrated with spin filters. Fractions were spotted onto 
CM10 and IMAC30 surfaces and analyzed with surface-enhanced laser desorption time of flight (SELDI-TOF) mass 
spectrometry (MS). Unsupervised peak detection and peak clustering were performed using MASDA software. 
Leave-one-out (LOO) validation for weighted Least Squares Support Vector Machines (LSSVM) was used for 
prediction of lymph node involvement. Other outcomes were histological type, lymphvascular space involvement 
(LVSI) and recurrent disease. 

Results: LSSVM models were able to determine LN status with a LOO area under the receiver operating 
characteristics curve (AUC) of 0.95, based on peaks with m/z values 2,698.9, 3,953.2, and 15,254.8. Furthermore, we 
were able to predict LVSI (AUC 0.81), to predict recurrence (AUC 0.92), and to differentiate between squamous 
carcinomas and adenocarcinomas (AUC 0.88), between squamous and adenosquamous carcinomas (AUC 0.85), and 
between adenocarcinomas and adenosquamous carcinomas (AUC 0.94). 

Conclusions: Potential markers related to lymph node involvement were detected, and protein/peptide profiling 
supports differentiation between various subtypes of cervical cancer. However, identification of the potential 
biomarkers was hampered by the technical limitations of SELDI-TOF MS. 
Background 

Cervical cancer is the seventh most common cancer in 
both sexes combined and the third most common cancer 
in women. An estimated 530,000 women across the 
world were diagnosed with cervical cancer in 2008, 
accounting for nearly one in ten (9%) of all cancers 
diagnosed in women. The developing countries carry the 
biggest burden of cervical cancer, with more than 
450,000 cases being diagnosed in 2008 [1]. 

Lymph node (LN) status is not part of the staging system 
of the International Federation of Gynecology and 
Obstetrics (FIGO) for cervical cancer [2], but it provides 
important information for prognosis and treatment, in 
particular in early stage cervical cancer [3,4]. The 
incidence of pelvic LN metastases varies from 0-2% in 
FIGO stage IA, 17-24% in FIGO stage IB1, 17-50% in 
FIGO stage IB2, and 10-50% in FIGO stage IIA [4-10]. 
In patients with early stage cervical cancer, the treatment 
of choice is either surgical, including radical 
hysterectomy and pelvic LN dissection, or chemoradiation. 
Combining both treatments leads to a higher morbidity, 
such as lymphedema and urological complications [11]. 
Specifically for patients with lymph node metastases, 
chemoradiation is the treatment of choice since it 
reduces local and distant recurrences [12]. Preoperative 
diagnostic modalities such as CT scan and MRI have a 
good specificity, but a low sensitivity [13,14]. This 
explains why a certain number of patients, in whom the 
diagnosis of positive LN is only made after pathological 
examination, still receive a combined treatment of 
surgery and pelvic irradiation. 

Various proteomics techniques have been used to detect 
new biomarkers in gynaecological cancers with variable 
degrees of success [15]. Over the last decade, 
surface-enhanced laser desorption time of flight 
(SELDI-TOF) mass spectrometry (MS) has been a popular 
proteomics technique due to its ease of use and high 
throughput. Several groups have published comparative 
studies on new diagnostic proteins [15]. 

We investigated whether we could improve the predic- 
tion of LN involvement with SELDI-TOF MS proteomic 
profiling. 

Results 

Patients 

Patient and tumour characteristics are presented in 
Table 1. Twelve patients were diagnosed with positive 
LNs. The remainder of the patients had a complete 
lymphadenectomy performed, but no positive lymph nodes 
were diagnosed. Both groups were well balanced for age, 
FIGO stage, histological subtype, number of removed 
LNs, duration of follow-up and incidence of recurrence. 
As expected, LVSI was associated with LN status. 

Unsupervised peak detection 

In total 597 different peaks were detected in our panel 
of 60 samples: 284 peaks on CM10 and 313 on IMAC30. 
Table 2 shows the number of peaks that were differentially 
expressed according to LN status, histological subtype, 
LVSI and recurrence of disease. In general, 
the number of differentially expressed peaks was higher 
in the low mass range, except for the difference between 
squamous carcinomas and adenocarcinomas. The total 
number of differentially expressed peaks ranged from 11 
to 37, depending on the comparison made. A 
complete list of the m/z values of the differentially 
expressed peaks with corresponding p-values is provided 
in Additional file 1. 



LOO internal validation for weighted LSSVM 

The AUC values obtained by LOO internal validation, 
with the optimal median and mean number of peaks 
across all LOO iterations, are presented in Table 3. For 
the prediction of LN status an AUC value of 0.95 was 
obtained (Figure 1). Three peaks were repeatedly 
selected in the LOO iterations: m/z values 2,698.9, 
3,953.2, and 15,254.8 from the IMAC low mass, CM10 
low mass, and IMAC high mass spectra, respectively 
(Table 4). 

LVSI was more difficult to predict. Although a median 
number of one peak was sufficient, the LOO AUC 
reached only 0.81. A median number of 1 
peak was needed to construct a model that was able to 
differentiate squamous carcinomas from adenocarcinomas 
(AUC 0.88), 4 peaks to differentiate between squamous 
and adenosquamous carcinomas (AUC 0.85), 1 
peak to differentiate between adenocarcinomas and 
adenosquamous carcinomas (AUC 0.94), and 3 peaks to 
predict recurrence (AUC 0.92). The most frequently 
selected peaks for the different comparisons are 
presented in Table 4. 

Discussion 

This study investigated whether we could improve the 
prediction of LN involvement with proteomic profiling. 
We used a combination of HPLC immunodepletion with 
SELDI-TOF MS to detect proteins that predict LN 
involvement. Using LSSVM models we were able to predict 
lymph node involvement with an AUC of 0.95. 
These findings suggest that serum biomarkers could 
help us identify patients with LN metastases. Other 
outcomes, such as histological type (AUC = 0.85-0.94), 
lymph vascular space involvement (AUC = 0.81) and 
recurrence (AUC = 0.92), were also predicted successfully; 
however, the number of patients in some of the subgroups 
was limited (e.g. adenosquamous subtype, n = 2), making 
the results less reliable. 

The majority of serum proteins are high-abundance 
proteins, accounting for almost 99% of the total protein mass 
[16]. Most of these proteins are true serum or plasma 
proteins that carry out their functions in the circulation, rather 
than proteins or peptides that leak into the blood (e.g. 
tumor tissue proteins) [16]. Removing the highly abundant 
proteins facilitates the discovery and identification of 
low-abundance proteins that may be biomarkers [17]. The 
MARS-14 immunodepletion column used in the present 
study removes 95-99% of the 14 most abundant proteins 
from serum, thereby increasing the likelihood of finding 
possible biomarkers [18,19]. This technique has proven to be 
highly reproducible [19]. However, due to protein-protein 
or protein-antibody interactions, non-targeted proteins 
are also removed [19,20], which could hamper the 
Table 1 Patient and tumour characteristics 

| Characteristic | Numerical display | LN positive (n = 12) | LN negative (n = 48) | Test | P value |
|---|---|---|---|---|---|
| Age in years | Mean (95%CI) | 45.8 (38.5-53.0) | 46.7 (43.3-50.0) | t-test | 0.732 |
| FIGO stage | | | | | |
| Ia2 | n (%) | 0 (0.0) | 2 (4.2) | | 0.134 |
| Ib1 | n (%) | 6 (50.0) | 37 (77.1) | | |
| Ib2 | n (%) | 2 (16.7) | 2 (4.2) | | |
| IIa | n (%) | 4 (33.3) | 7 (14.6) | | |
| Histological subtype | | | | | |
| Squamous cell carcinoma | n (%) | 11 (91.7) | 29 (60.4) | / | 0.119 |
| Adenocarcinoma | n (%) | 1 (8.3) | 17 (35.4) | | |
| Adenosquamous carcinoma | n (%) | 0 (0.0) | 2 (4.2) | | |
| Lymph nodes | | | | | |
| Number of positive LN | Median (min-max) | 1 (1-7) | 0 (0-0) | Mann-Whitney | <0.001 |
| Number of removed LN | Median (min-max) | 28 (4-50) | 34 (18-89) | Mann-Whitney | 0.241 |
| LVSI | | | | | |
| Positive | n (%) | 10 (83.3) | 16 (33.3) | / | 0.005 |
| Negative | n (%) | 2 (16.7) | 32 (66.7) | | |
| Follow-up | | | | | |
| Follow-up (in months) | Mean (95%CI) | 61.8 (42.0-81.5) | 61.7 (49.0-66.5) | t-test | 0.997 |
| Recurrence | | | | | |
| Recurrence | n (%) | 3 (25.0) | 7 (14.6) | | 0.555 |
| No recurrence | n (%) | 9 (75.0) | 41 (85.4) | | |

Abbreviations: LN = lymph node; FIGO = International Federation of Gynecology and Obstetrics; LVSI = lymphvascular space involvement. 



detection of certain proteins. Moreover, some reports 
mention that the detection of medium abundance proteins 
improves, but not the detection of the very low abundance 
proteins (<10 ng/mL) [18]. This is the range in which some 
of the currently known biomarkers are found (e.g. CEA) 
[16]. Another problem with immunodepletion in combination 
with SELDI-TOF MS is that the two systems, HPLC 
and SELDI-TOF MS, are not in-line, unlike other LC-MS 
techniques. The additional sample handling introduces additional 
experimental variables, such as additional freezing/thawing 
cycles and manual handling of the samples. 

Upon establishing the biomarker profiles for lymph node 
involvement in cervical cancers, it became interesting to 
identify the proteins behind the differentially expressed 
peaks. For the 15,254.8 peak detected on the IMAC30 chip, 
an approach was developed using immunodepletion and 
SDS-PAGE gel electrophoresis as initial separation steps. 
Unfortunately, due to the apparently very low concentration 
of this protein in serum, no Coomassie Blue band 
could be detected at the level of 15-16 kDa. For the two 
lower masses (2,698.9 and 3,953.2) an attempt was 
undertaken for direct identification from the corresponding 
SELDI target plate. This involved the use of a special SELDI 
Chip target adapter (Bruker Daltonics, Bremen, Germany) 
to analyze the spots with a matrix-assisted laser desorption/ 
ionization (MALDI)-TOF/TOF MS (Ultraflex 2, Bruker 
Daltonics, Bremen, Germany). Indeed, the TOF/TOF MS 
can induce fragmentation of selected masses, which is 
essential for their subsequent identification. However, 
SELDI-TOF MS is known for having a poor mass accuracy or 
reproducibility [21]. This made it difficult to determine which 
peak in the 2,650-2,750 and the 3,900-4,000 Da range on 
MALDI-TOF MS/MS was responsible for the 2,698.9 and 
3,953.2 peaks on SELDI-TOF. Moreover, collision induced 
dissociation (CID) of high mass peaks (>3 kDa) is difficult 
in currently available MALDI TOF/TOF MS instruments, 
yielding no or incomplete fragments from these masses. 
Alternatively, an off-line sample preparation was explored to 
allow analysis of larger volumes of samples using a MALDI 
target plate. In this project, SELDI-TOF MS on-chip 
chromatographic surfaces are used to select proteins with either 
cationic or metal affinity properties. This gives two 
advantages to SELDI-TOF MS: (1) the chromatographic surface 
acts as an additional fractionation step, selecting only a 
subset of proteins that will be analyzed (enrichment), and (2) 
the proteins get separated from salts and other sample 
contaminants by subsequent on-spot washing with appropriate 
buffer solutions. As in MALDI MS analysis, on-chip 
Table 2 The total number of identified peaks and the number of peaks that were significantly differentially expressed 
for the given comparisons 

| Comparison | CM10 low mass (<10 kDa) | CM10 high mass (>10 kDa) | IMAC low mass (<10 kDa) | IMAC high mass (>10 kDa) | Total |
|---|---|---|---|---|---|
| Total number of peaks | 175 | 109 | 172 | 141 | 597 |
| Lymph node status: Negative vs Positive | 2 | 0 | 5 | 5 | 12 |
| Histological subtype: Squamous ca. vs Adenoca. | 8 | 15 | 3 | 5 | 31 |
| Histological subtype: Squamous ca. vs Adenosquamous ca. | 4 | 2 | 4 | 1 | 11 |
| Histological subtype: Adenoca. vs Adenosquamous ca. | 3 | 0 | 18 | 0 | 21 |
| LVSI: Negative vs Positive | 18 | 0 | 14 | 5 | 37 |
| Recurrence: Negative vs Positive | 4 | 0 | 7 | 3 | 14 |

Abbreviations: CM10 = weak cation exchanger array; IMAC = immobilized metal affinity capture array; ca. = carcinoma; LVSI = lymph vascular space involvement. 


purification is not possible, sample cleanup procedures 
must be applied before the sample is put on the target to 
reduce noise and ion suppression. In our identification 
experiments we applied an additional desalting step by 
using reversed-phase chromatography, either by HPLC or 
by C4 or C18 Zip-Tip. These additional steps introduced 
additional experimental variables, making it even more 
uncertain to identify the correct protein. Taken together, the 
additional sample preparations resulted in sample loss as 
well as introducing qualitative and quantitative variances, 
without leading to the required identification. 

When looking at the literature on SELDI-TOF experiments, 
it can be noticed that in only a minority of 
papers an identification was performed. Most of the 
papers mention that identification and validation of the 
newly discovered biomarkers is ongoing. However, 
follow-up papers on the identified proteins or validation 
studies are rarely published. For example, SELDI-TOF 
MS was used to differentiate cervical cancer and normal 
cervix tissue in the study by Wong et al. [22]. The 
authors were able to discover a discriminatory peak 
profile with a sensitivity of 87% and a specificity of 100%. 
To the best of our knowledge there was no follow-up 
study published in which these results were validated or 
the proteins identified. Another example is the study by 
Lin et al. [23] in which plasma proteomic profiling with 
SELDI-TOF MS was used to differentiate in situ 
carcinoma and invasive carcinoma of the cervix. Although a 
very high sensitivity and specificity was found with a 
limited amount of differentially expressed peaks, there 


Table 3 AUC obtained by leave-one-out internal validation (LOO) with the optimal median and mean number of peaks 
per iteration 

| Comparison | LOO AUC (SE) | Sensitivity (%) | Specificity (%) | Median number of peaks per LOO iteration | Mean number of peaks per LOO iteration (SD) |
|---|---|---|---|---|---|
| Lymph node status: Negative vs Positive | 0.95 (0.03) | 73.9 | 91.7 | 1 | 1 (0) |
| Histological subtype: Squamous ca. vs adenoca. | 0.88 (0.05) | 88.2 | 59.0 | 1 | 1 (0) |
| Histological subtype: Squamous ca. vs adsq ca. | 0.85 (0.06) | 84.6 | 100 | 4 | 3.8 (0.9) |
| Histological subtype: Adenoca. vs adsq ca. | 0.94 (0.06) | 94.1 | 100 | 1 | 0.9 (0.3) |
| LVSI: Negative vs Positive | 0.81 (0.06) | 78.1 | 73.1 | 1 | 1 (0) |
| Recurrence: Negative vs Positive | 0.92 (0.04) | 79.2 | 90.0 | 3 | 3 (0) |

Abbreviations: SE = standard error; SD = standard deviation; ca. = carcinoma; Adsq ca. = Adenosquamous carcinoma; LVSI = lymph vascular space involvement. 
Figure 1 Receiver operating characteristics (ROC) curve for the prediction of lymph node status (x-axis: 1 - specificity; AUC = 0.947, SE = 0.030). 
Abbreviations: AUC = area under the curve; SE = standard error. 



were no follow-up studies published. Furthermore, this 
is not only the case for biomarker discovery studies for 
gynecological cancers [15], but also for various other 
types of cancer [24,25]. This calls into question the 
utility/advantage of using a SELDI-TOF MS approach. Over the 
last decade the field of mass spectrometry has evolved 
and expanded with new techniques: high-definition MS 
equipment and new software enable scientists to detect 
proteins down to the femtogram level. Future developments 
include tandem expansions with multiple connections to 
HPLC equipment. In-depth analyses of fluid or tissue 
specimens now seem possible. There is a place for a 
global proteomics approach, but this should be an 
in-depth proteomic profiling with high levels of 
fractionation, separation and identification. 

Conclusions 

In conclusion, the SELDI-TOF MS approach has allowed 
us to discover a set of proteomic profiles (revealing 
potential biomarkers) that could help in the diagnosis of 
LN metastases. However, the proteins/peptides 
concerned were not identified due to technical limitations of 
the SELDI-TOF MS technique. 

Material and methods 

Patients 

Serum samples of 60 cervical cancer patients were 
obtained before primary surgery. All patients were 
diagnosed with FIGO stage I or II cervical cancer. Prior to 
enrolment in the study, all patients were required to give 
fully informed consent. The protocol was approved by the 
Local Ethics Committee (reference: 3M040097/ML2524). 

Depletion 

For each of the 60 serum samples, immunodepletion 
was performed using a high capacity 4.6 x 100 mm 
multiple affinity removal system (MARS) column (Agilent 
Technologies, Diegem, Belgium) in an Agilent 1200 high 
pressure liquid chromatography (HPLC) system (Agilent 
Technologies, Diegem, Belgium). This column 
eliminates the 14 most abundant proteins ubiquitously 
present in serum: albumin, alpha1-acid glycoprotein, 
alpha2-macroglobulin, antitrypsin, apolipoprotein AI, 
apolipoprotein AII, complement C3, fibrinogen, 
haptoglobin, IgA, IgG, IgM, transferrin, and transthyretin. In 
brief, the serum samples were diluted four-fold with 
Buffer A (Agilent Technologies, Diegem, Belgium), filtered 
through a 0.22 µm spin filter and 100 µl of the diluted 
serum was injected into the column in 100% Buffer A at 
a flow rate of 0.125 mL/min. After collection of the 
flow-through (i.e. depleted fraction) for 5.5 min, the 
column was washed and the bound (high abundance) 
proteins were eluted with 100% Buffer B (Agilent 
Technologies, Diegem, Belgium) at a flow rate of 1 mL/min 
for 2.5 min. The column was re-equilibrated using 
100% Buffer A. Protein elution was monitored at a 
wavelength of 280 nm during the chromatography 
fractionation process. Reproducibility and efficiency of the MARS 
column were checked by inspecting the peak position and 
height of the flow-through and eluted proteins as well as 
Table 4 Most frequent selected peaks in the leave-one-out internal validation (LOO) iterations, with the corresponding 
chip surface and mass range 

| Comparison | Median m/z value | Occurrence* | p-value | Chip surface | Mass range§ |
|---|---|---|---|---|---|
| Lymph node status: Negative vs Positive | 2698.945 | 15 | 0.023 | IMAC | Low |
| | 15254.808 | 14 | 0.022 | IMAC | High |
| | 3953.177 | 13 | 0.024 | CM10 | Low |
| Histological subtype: Squamous ca. vs Adenoca. | 12802.775 | 28 | 0.021 | CM10 | High |
| | 78632.414 | 11 | 0.020 | IMAC | High |
| Histological subtype: Squamous ca. vs Adenosquamous ca. | 1532.112 | 39 | 0.032 | CM10 | Low |
| | 1532.166 | 39 | 0.032 | CM10 | Low |
| | 1627.269 | 39 | 0.032 | IMAC | Low |
| | 4783.483 | 38 | 0.032 | IMAC | Low |
| Histological subtype: Adenoca. vs Adenosquamous ca. | 1531.463 | | 0.012 | CM10 | Low |
| LVSI: Negative vs Positive | 1741.204 | 58 | 0.008 | IMAC | Low |
| | 3224.349 | 13 | 0.010 | CM10 | Low |
| Recurrence: Negative vs Positive | 94029.326 | 53 | 0.021 | IMAC | High |
| | 97177.269 | 52 | 0.018 | IMAC | High |
| | 78294.986 | 52 | 0.027 | IMAC | High |
| | 2044.703 | 21 | 0.031 | IMAC | Low |
| | 1979.514 | 15 | 0.035 | IMAC | Low |

* The number of times the peak was selected within the different LOO iterations. 
§ Low mass range: <10 kDa; high mass range >10 kDa. 

Abbreviations: CM10 = weak cation exchanger array; IMAC = immobilized metal affinity capture array; ca. = carcinoma; LVSI = lymph vascular space involvement. 



the overlay of the first and last chromatogram of every 
column using pooled serum samples as controls. 
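
The volumes implied by the depletion protocol can be worked out from the figures quoted above. The following is a minimal arithmetic sketch, assuming that "four-fold" means one part serum in four parts total and that the collected volume is simply flow rate times collection time; the variable and function names are illustrative and not taken from the study.

```python
# Volumes per injection, derived only from figures quoted in the text.
DILUTION_FACTOR = 4      # serum diluted four-fold with Buffer A (assumed 1 part in 4)
INJECTION_UL = 100.0     # volume of diluted serum injected, in microlitres


def collected_volume_ml(flow_ml_per_min, minutes):
    """Volume leaving the column at a constant flow rate over a time window."""
    return flow_ml_per_min * minutes


# Undiluted serum loaded per injection (microlitres)
neat_serum_ul = INJECTION_UL / DILUTION_FACTOR

# Depleted flow-through collected at 0.125 mL/min for 5.5 min (millilitres)
flow_through_ml = collected_volume_ml(0.125, 5.5)

# Bound fraction eluted at 1 mL/min for 2.5 min (millilitres)
elution_ml = collected_volume_ml(1.0, 2.5)

print(neat_serum_ul, flow_through_ml, elution_ml)
```

Under these assumptions each injection carries 25 µl of undiluted serum and yields roughly 0.69 mL of depleted flow-through, which is why a concentration step follows.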



Concentration and buffer exchange 

The collected flow-through fraction containing the 
low-abundant proteins was filtered using a 1,000 Da 
molecular weight Microsep spin filter (Pall, Zaventem, Belgium) 
for the low molecular weight analysis and a 5,000 Da 
molecular weight Agilent spin filter (Agilent, Diegem, 
Belgium) for the high molecular weight analysis. After a 
first filtration step at 7500 x g for 100 and 30 min for 
the 1,000 and 5,000 Da spin filter, respectively, a fixed 
amount of the SELDI-TOF MS binding buffer (CM10 
and IMAC binding buffers: see below for specifications) 
was added and the filtration step was repeated. This last 
step (adding buffer + filtration) was repeated three times 
to perform a buffer exchange from Buffer A to the 
SELDI-TOF MS binding buffers. The samples were then 
stored at -80°C until further use. 



Protein profiling with SELDI-TOF MS 

Fractions were analysed in duplicate on CM10 (weak cation 
exchanger) and copper-coated IMAC30 (immobilized 
metal affinity capture) arrays (Bio-Rad, Nazareth, Belgium). 
All samples were randomly assigned to the different spots. 
For the CM10 arrays, spots were pre-incubated twice with 
CM10 binding buffer (0.1 M sodium acetate, pH 4.0) 
followed by application of 100 µl of the sample in the same 
binding buffer. For the IMAC30 arrays, spots were 
pre-incubated twice with 50 µl of 0.1 M copper sulphate for 
5 min at room temperature followed by a wash step with 
0.1 M sodium acetate buffer pH 4 for 5 min at room 
temperature. Spots were then pre-incubated twice with 
IMAC30 binding buffer (0.1 M sodium phosphate, 0.5 M 
NaCl pH 7) followed by application of 100 µl of the sample 
in the same binding buffer. Samples were incubated for 
60 min at 4°C with shaking on a MicroMix (Siemens 
Medical Solutions Diagnostics, Brussels, Belgium). After three 
additional wash steps with the same binding buffer and 
two final washes with water, 2 x 1 µl of 20% 
α-cyano-4-hydroxycinnamic acid (CHCA) or 100% sinapinic acid 
(SPA) (Bio-Rad, Nazareth, Belgium) dissolved in 1% TFA/ 
100% ACN were applied. CHCA was predominantly used 
to improve ionization for lower mass peaks (< 10,000 Da) 
and SPA for the high mass peaks (10,000-100,000 Da). 
Mass analysis was performed using SELDI-TOF MS (PCS 
4000 Enterprise, Ciphergen ProteinChip Reader Inc., 
Fremont, CA) applying automated data collection protocols 
for a molecular weight of < 10,000 Da (low molecular 
weight protocol) and for 10,000-100,000 Da (high 
molecular weight protocol). The following settings were used: (a) 
sampling rate 400 MHz; (b) 2 warming shots (not included 
in analysis), 10 data shots per point and (c) total number of 
points evaluated equal to 12.5% of the spot surface. The 
low and high molecular weight protocols were further 
optimized in pilot studies (data not shown) to reach an optimal 
number of peaks and signal to noise (S/N) ratio (the 
maximum number of peaks at S/N > 2 and S/N > 5 were 
counted per laser intensity). For the low molecular weight 
protocol a laser intensity of 2,500 nJ, focus mass of 5,000 Da 
and matrix attenuation of 500 Da were chosen. For the high 
molecular weight protocol a laser intensity of 2,500 nJ, 
focus mass of 19,000 Da and matrix attenuation of 5,000 Da 
were chosen. Mass accuracy was calibrated externally using 
the all-in-one peptide and all-in-one protein standard 
according to the manufacturer's instructions (Bio-Rad) for 
the low and high molecular weight analysis, respectively. A 
quality control sample (pooled serum) was analyzed weekly 
to validate the output of the system. Pooled serum samples 
were also used as positive controls (one spot on every chip 
was randomly assigned) and run with the same protocol as 
the weekly control samples. Data analysis of the control 
samples was performed with Shewhart control chart plots 
[26]. The fulfillment of the following Westgard rules was 
checked: 1:3s, 2:2s, 4:1s, 10x. The analysis of the quality 
control samples was within limits during the timeframe 
of this study. Using the Ciphergen Express Software, baseline 
subtraction and noise reduction were completed before 
peak intensities were normalized to the total ion current of 
the experimental samples. Outlier spectra were identified 
and removed from the analyses when the normalisation 
factor deviated more than 2 standard deviations. Numeric 
data were exported to csv-files for further biostatistical 
processing. 
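
The normalisation and outlier step described above can be illustrated with a short sketch. It is not the Ciphergen Express implementation: the definition of the normalisation factor (mean total ion current divided by each spectrum's total ion current) and the function names `tic_normalise` and `flag_outliers` are assumptions made for illustration, and the 2 standard deviation rule is read as a deviation from the mean factor.

```python
import numpy as np


def tic_normalise(spectra):
    """Scale each spectrum (row) so that its total ion current (TIC)
    equals the mean TIC of all spectra.

    Returns the normalised spectra and the per-spectrum factors.
    The factor definition is an assumption, not the vendor's formula.
    """
    spectra = np.asarray(spectra, dtype=float)
    tic = spectra.sum(axis=1)
    factors = tic.mean() / tic
    return spectra * factors[:, None], factors


def flag_outliers(factors, n_sd=2.0):
    """Boolean mask: True where a normalisation factor lies more than
    n_sd sample standard deviations from the mean factor."""
    factors = np.asarray(factors, dtype=float)
    return np.abs(factors - factors.mean()) > n_sd * factors.std(ddof=1)


# Toy example: nine similar spectra and one with ten-fold higher signal.
spectra = [[1, 2, 3, 4]] * 9 + [[10, 20, 30, 40]]
normalised, factors = tic_normalise(spectra)
outliers = flag_outliers(factors)
kept = normalised[~outliers]
```

In the toy example only the last spectrum is flagged, so nine spectra remain for further processing.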



Data analysis 

With the aid of MASDA software the following additional 
preprocessing steps were performed [27,28]: (1) peak 
detection based on changes in the first derivative of a 
sample's intensity curve, (2) peak filtering with exclusion of 
peaks below a local noise threshold defined as the median 
plus five times the median absolute deviation, and (3) peak 
matching/alignment across samples using complete 
linkage hierarchical one-dimensional clustering. The 
significance of peaks was determined with the non-parametric 
Wilcoxon rank sum test. A p-value of <0.05 was deemed 
significant. 
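
Steps (1) and (2) and the significance test can be sketched as follows. This is not the MASDA code: it is a minimal illustration that assumes the noise threshold is computed over the whole spectrum rather than in a local window, and it omits the clustering-based alignment of step (3). The function names are hypothetical; `scipy.stats.ranksums` provides the Wilcoxon rank sum test.

```python
import numpy as np
from scipy.stats import ranksums


def noise_threshold(intensity, k=5.0):
    """Noise threshold: median plus k times the median absolute deviation."""
    intensity = np.asarray(intensity, dtype=float)
    med = np.median(intensity)
    mad = np.median(np.abs(intensity - med))
    return med + k * mad


def detect_peaks(intensity, k=5.0):
    """Return indices of local maxima that exceed the noise threshold.

    A local maximum is a point where the first derivative changes from
    positive to non-positive. The threshold is global here (assumption).
    """
    intensity = np.asarray(intensity, dtype=float)
    d = np.diff(intensity)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    return maxima[intensity[maxima] > noise_threshold(intensity, k)]


def is_differential(group_a, group_b, alpha=0.05):
    """Wilcoxon rank sum test on the intensities of one aligned peak."""
    return bool(ranksums(group_a, group_b).pvalue < alpha)


# Toy spectrum: a noisy baseline with one clear peak at index 7.
spectrum = [3, 5, 4, 6, 3, 5, 4, 60, 4, 5, 3, 6, 4]
peaks = detect_peaks(spectrum)
```

For the toy spectrum the median is 4 and the median absolute deviation is 1, so the threshold is 9 and only the maximum at index 7 survives filtering.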

Weighted Least Squares Support Vector Machine 
(LSSVM) in combination with leave-one-out (LOO) 
cross-validation was used to build classifiers [29,30]. For the 
optimization of the number of peaks included in the classifiers, 
the number of peaks tested within each LOO iteration 
ranged from 1 to a maximum of 10, only including significant 
peaks (p < 0.05). For both CM10 and IMAC30, the low 
mass and high mass peaks were simultaneously included in 
the models in order of decreasing significance. The optimal 
model parameter (regularization parameter of the weighted 
LSSVM) was chosen as the one corresponding to the 
largest area under the curve (AUC) of the receiver operating 
characteristic curve. When multiple parameters with the 
same AUC were present, the one minimizing the balanced 
error rate, i.e. with the highest possible sum of sensitivity and 
specificity, was chosen. The main outcome was LN status (negative vs 
positive). Secondary outcomes were histological subtype, 
lymph-vascular space involvement (LVSI) and recurrent 
disease. 
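
The classification scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a linear kernel is assumed, the "weighted" part is interpreted as class-balancing error weights, and the number of peaks and the regularization parameter `gamma` are fixed rather than tuned by AUC. All function names are hypothetical. Peaks are ranked by the Wilcoxon rank sum test inside each LOO iteration, using the training samples only.

```python
import numpy as np
from scipy.stats import ranksums


def lssvm_fit(X, y, gamma=1.0, weights=None):
    """Solve the LS-SVM linear system with a linear kernel.

    y holds labels in {-1, +1}. weights are per-sample error weights;
    class-balancing weights are assumed when none are given.
    """
    n = len(y)
    if weights is None:
        n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
        weights = np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * n_neg))
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = X @ X.T + np.diag(1.0 / (gamma * weights))
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def lssvm_decision(X_train, b, alpha, X_new):
    """Latent output of the fitted model for new samples."""
    return X_new @ X_train.T @ alpha + b


def loo_scores(X, y, n_peaks=1, gamma=1.0, alpha_sig=0.05):
    """Leave-one-out latent outputs with peak selection inside each fold."""
    n = len(y)
    scores = np.zeros(n)
    for i in range(n):
        train = np.arange(n) != i
        Xtr, ytr = X[train], y[train]
        # Rank peaks by rank sum p-value on the training fold only
        p = np.array([ranksums(Xtr[ytr == 1, j], Xtr[ytr == -1, j]).pvalue
                      for j in range(X.shape[1])])
        order = np.argsort(p)
        keep = [j for j in order[:n_peaks] if p[j] < alpha_sig]
        if not keep:  # no significant peak: fall back to the best one
            keep = [order[0]]
        mu = Xtr[:, keep].mean(axis=0)
        sd = Xtr[:, keep].std(axis=0) + 1e-12
        Ztr = (Xtr[:, keep] - mu) / sd
        zi = (X[i:i + 1][:, keep] - mu) / sd
        b, a = lssvm_fit(Ztr, ytr, gamma)
        scores[i] = lssvm_decision(Ztr, b, a, zi)[0]
    return scores


def auc(scores, y):
    """Area under the ROC curve as the fraction of correctly ordered
    positive/negative pairs (ties count one half)."""
    pos, neg = scores[y == 1], scores[y == -1]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))
```

To mirror the procedure in the text, `loo_scores` would be called for `n_peaks` from 1 to 10 and for a grid of `gamma` values, keeping the setting with the largest LOO AUC.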